Sequential Experimental Design for Transductive Linear Bandits

Neural Information Processing Systems

In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. The transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we present the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
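The problem setup described above can be illustrated with a minimal sketch. This is not the algorithm from the paper: it uses a uniform, non-adaptive allocation over $\mathcal{X}$ followed by ordinary least squares, and it assumes Gaussian noise. The function name `simulate_transductive_instance`, its parameters, and the toy instance are all hypothetical, chosen only to show how measuring along $\mathcal{X}$ can identify the best item in a different set $\mathcal{Z}$.

```python
import numpy as np

def simulate_transductive_instance(X, Z, theta_star, n_per_arm=200,
                                   noise_std=1.0, seed=0):
    """Hypothetical helper: measure every x in X equally often,
    then recommend the item in Z with the largest estimated reward."""
    rng = np.random.default_rng(seed)
    # Uniform, non-adaptive allocation: repeat each measurement vector n_per_arm times.
    A = np.repeat(X, n_per_arm, axis=0)
    # Noisy observations of x^T theta*; Gaussian noise is an assumption of this sketch.
    y = A @ theta_star + noise_std * rng.standard_normal(A.shape[0])
    # Ordinary least-squares estimate of theta*.
    theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Estimated arg max over the item set Z (which need not equal X).
    return int(np.argmax(Z @ theta_hat)), theta_hat

# Toy instance: measurements are restricted to the standard basis,
# while the items of interest point in other directions.
X = np.eye(2)
Z = np.array([[1.0, 0.2], [0.2, 1.0], [0.7, 0.7]])
theta_star = np.array([1.0, 0.0])

best, theta_hat = simulate_transductive_instance(X, Z, theta_star)
true_best = int(np.argmax(Z @ theta_star))
print("recommended item:", best, "true best item:", true_best)
```

A sequential design, as studied in the paper, would instead choose which $x\in\mathcal{X}$ to measure adaptively so that fewer measurements are spent on directions that do not help distinguish the items in $\mathcal{Z}$.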


Sequential Experimental Design for Transductive Linear Bandits

Neural Information Processing Systems

In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. The transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation.


An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

Nakamura, Shintaro, Sugiyama, Masashi

arXiv.org Artificial Intelligence

We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB), focusing on the case where the size of the action set is polynomial in the number of arms. In this case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. Existing methods for the R-CPE-MAB and for transductive linear bandits leave a gap between the upper and lower bounds on the sample complexity: problem-dependent constant terms for the former and logarithmic terms for the latter. We close these gaps by proposing an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound. Finally, we show numerically that the CombGapE algorithm significantly outperforms existing methods.


Sequential Experimental Design for Transductive Linear Bandits

Fiez, Tanner, Jain, Lalit, Jamieson, Kevin G., Ratliff, Lillian

Neural Information Processing Systems

In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. The transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$.